Unit selection in a concatenative speech synthesis system using a large speech database
نویسندگان
چکیده
One approach to the generation of natural-sounding synthesized speech waveforms is to select and concatenate units from a large speech database. Units (in the current work, phonemes) are selected to produce a natural realisation of a target phoneme sequence predicted from text which is annotated with prosodic and phonetic context information. We propose that the units in a synthesis database can be considered as a state transition network in which the state occupancy cost is the distance between a database unit and a target, and the transition cost is an estimate of the quality of concatenation of two consecutive units. This framework has many similarities to HMM-based speech recognition. A pruned Viterbi search is used to select the best units for synthesis from the database. This approach to waveform synthesis permits training from natural speech: two methods for training from speech are presented which provide weights which produce more natural speech than can be obtained by hand-tuning.
منابع مشابه
Data-driven Segment Pres Trainable Speech Syn
Unit selection based concatenative speech synthesis has proven to be a successful method of producing high quality speech output. However, in order to produce high quality speech, large speech databases are required. For some applications, this is not practical due to the complexity of the database search process and the storage requirements of such databases. In this paper, we propose a data-d...
متن کاملNew Developments in Data-Driven Concatenative Sound Synthesis
Concatenative data-driven synthesis methods based on a large database of sounds and a unit selection algorithm are gaining more interest in the computer music world. We briefly describe recent related work and then focus on new developments in our CATERPILLAR synthesis system: the advantages of the addition of a relational SQL database, work on segmentation by alignment, the reformulation and e...
متن کاملA System for Data-driven Concatenative Sound Synthesis
In speech synthesis, concatenative data-driven synthesis methods prevail. They use a database of recorded speech and a unit selection algorithm that selects the segments that match best the utterance to be synthesized. Transferring these ideas to musical sound synthesis allows a new method of high quality sound synthesis. Usual synthesis methods are based on a model of the sound signal. It is v...
متن کاملDiphone synthesis using unit selection
This paper describes an experimental AT&T concatenative synthesis system using unit selection, for which the basic synthesis units are diphones. The synthesizer may use any of the data from a large database of utterances. Since there are in general multiple instances of each concatenative unit, the system performs dynamic unit selection. Selection among candidates is done dynamically at synthes...
متن کاملProsody-based unit selection for Japanese speech synthesis
A corpus-based concatenative speech synthesis system using no signal processing can produce intelligible synthetic speech maintaining original voice characteristics. In such a concatenative system, it is very important to select appropriate waveform segments that are naturally close to the target prosody. But with a limited size database it can sometimes be di cult to realize natural prosody. T...
متن کاملThe Caterpillar System for Data-driven Concatenative Sound Synthesis
Concatenative data-driven synthesis methods are gaining more interest for musical sound synthesis and effects. They are based on a large database of sounds and a unit selection algorithm which finds the units that match best a given sequence of target units. We describe related work and our CATERPILLAR synthesis system, focusing on recent new developments: the advantages of the addition of a re...
متن کاملذخیره در منابع من
با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید
عنوان ژورنال:
دوره شماره
صفحات -
تاریخ انتشار 1996